-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Gdt 54 aardvark transform #108
Conversation
Why these changes are being introduced: * This is the initial structure for the Aardvark transform class. The class will be expanded with new methods in subsequent commits. How this addresses that need: * Add jsonlines to Pipfile * Add fixtures for aardvark and generic JSONLines files * Update argument type hinting for Transformer and JsonTransformer classes to clarify expected content types * Update JsonTransformer.parse_source_file method to use jsonlines library * Add Aardvark class with get_main_titles, get_source_record_id, record_is_deleted (in progress), get_optional_fields (in progress), and get_subjects methods and corresponding unit tests Side effects of this change: * None Relevant ticket(s): * https://mitlibraries.atlassian.net/browse/GDT-54
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall, I think it looks really good. I find it easy to follow the XML vs JSON transformer types. And while we may or may not move to more field-specific methods, I think breaking out complex ones like get_subjects
is a good approach (and consistent with other transformers).
Left some questions, which I think are largely correlated with how closely we want this new Aaardvark
transformer to be coupled with an "MITAardvark" record coming from GeoHarvester. Maybe that coupling is tightented through naming, maybe some is assumptions in data that will always be present. Did not explicitly approve or request changes, but curious your thoughts on that.
Otherwise, think looks like a great start and will setup nicely for the next fields to map.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hiya' Eric! Thank you for scoping out the skeleton for the MITAardvark transformer. I had a couple of questions!
Is my understanding correct that this is intended to be a "scaffolding"-level PR with future PRs (under the same ticket GDT-54
) for the actual implementation? I plan to do a second pass after some clarification!
* Update json_records fixture to aardvark_records for more accurate unit tests * Rename Aardvark > MITAardvark to unify terminology across repos * Update get_main_titles method to reflect it is a required field * Update Aardvark method docstrings to provide greater context * Add Transformer._transform method to minimize code duplication between JsonTransformer and XmlTransformer methods
@ghukill @jonavellecuerdo Pushed updates, let me know what you think! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me!
I left a comment about the docstring for Transformer._transform()
, but it's just a suggestion, and not required. Approved with or without changes to that docstring.
Helpful background context
This is the first of several PRs to build out the
Aardvark
class with methods for each relevant TIMDEX field. I'm intending for this to be a template for the future PRs with examples of more complex methods (get_subjects
) and calls in theget_optional_fields
method for more straightforward values (languages
).How can a reviewer manually see the effects of these changes?
Updates to the CLI are needed before this can be tested manually.
What are the relevant tickets?
Developer
Code Reviewer
(not just this pull request message)
Includes new or updated dependencies?
YES